Human cellular protein patterns and their link to genome DNA
sequence data: usefulness of two-dimensional gel
electrophoresis and microsequencing

JULIO E. CELIS, HANNE H. RASMUSSEN, HENRIK LEFFERS, PEDER MADSEN, BENT HONORE

ABSTRACT Analysis of cellular protein patterns by computer-aided 2-dimensional gel electrophoresis, together with recent advances in protein sequence analysis, has made possible the establishment of comprehensive 2-dimensional gel protein databases that may link protein and DNA information and that offer a global approach to the study of the cell. Using the integrated approach offered by 2-dimensional gel protein databases it is now possible to reveal phenotype-specific proteins, to microsequence them, to search for homology with previously identified proteins, to clone the cDNAs, to assign partial protein sequences to genes for which the full DNA sequence and the chromosome location are known, and to study the regulatory properties and function of groups of proteins that are coordinately expressed in a given biological process. Human 2-dimensional gel protein databases are becoming increasingly important in view of the concerted effort to map and sequence the entire genome. Celis, J. E.; Rasmussen, H. H.; Leffers, H.; Madsen, P.; Honore, B.; Gesser, B.; Dejgaard, K.; Vandekerckhove, J. Human cellular protein patterns and their link to genome DNA sequence data: usefulness of two-dimensional gel electrophoresis and microsequencing. FASEB J. 5: 2200-2208; 1991.

Key Words: human protein patterns • 2-dimensional gel protein databases • gene expression • microsequencing • cDNA cloning • linking protein and DNA information • genome mapping and sequencing



Proteins synthesized from information contained in the DNA orchestrate most cellular functions. The total number of proteins synthesized by a typical human cell is unknown, although current estimates range from 3000 to 6000. Of these, as many as 70% may perform household functions and are expected to be shared by all cell types irrespective of their origin. There are many different cell types in the human body, with perhaps 30,000 to 50,000 proteins expressed in the organism as a whole, judged from the fact that about [fraction illegible in source] of the haploid genome corresponds to genes. Today only a small fraction of the total set of proteins has been identified, and little is known about the protein patterns of individual cell types or their variation under physiological and abnormal conditions.

For the past 15 years, high resolution 2-dimensional gel electrophoresis has been the technique of choice for determining the protein composition of a given cell type and for monitoring changes in gene activity through quantitative and qualitative analysis of the thousands of proteins that orchestrate various cellular functions (refs 1-6 and references therein). The technique, originally described by O'Farrell, separates proteins in terms of their isoelectric point (pI) and molecular weight. Usually one chooses a condition of interest and the cell reveals the global protein behavioral response, as all detected proteins can be analyzed both qualitatively and quantitatively in relation to each other. At present, most available 2-dimensional gel techniques (regular gel format) can resolve between 1000 and 2000 proteins from a given mammalian cell type, a number that corresponds to about 2 million base pairs of coded DNA. Less abundant proteins can be detected by analyzing partially purified cellular fractions.
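
The correspondence between resolved proteins and coding DNA can be checked with a back-of-the-envelope calculation. The sketch below assumes an average coding length of about 1000 base pairs per protein (roughly 333 amino acids); that figure is an illustrative assumption and is not stated in the text.

```python
# Rough estimate of the coding DNA represented by the proteins resolved on one gel.
# ASSUMPTION: an average protein-coding length of ~1000 bp (about 333 amino acids).
AVG_CODING_BP = 1000


def coding_bp(n_proteins, avg_coding_bp=AVG_CODING_BP):
    """Return the total coding base pairs for n_proteins resolved proteins."""
    return n_proteins * avg_coding_bp


for n in (1000, 2000):
    print(n, "proteins ->", coding_bp(n), "bp")
```

Under this assumption the upper end of the resolution range (2000 proteins) gives the 2 million base pairs quoted above.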

Two-dimensional gel electrophoresis has been widely applied to analysis of cellular protein patterns from bacteria to mammalian cells (refs 1-6, and references therein). In spite of much work, however, information gathered from these studies has not reached the scientific community in its fullness because of the lack of standardized gel systems and the lack of means for storing and communicating protein information. Only recently, because of the development of appropriate computer software (7-13), has it been possible to scan gels, assign numbers to individual proteins, and store the wealth of information in quantitative and qualitative comprehensive 2-dimensional gel protein databases (4, 14-23), i.e., those containing information about the various properties (physical, chemical, biological, biochemical, physiological, genetic, immunological, architectural, etc.) of all the proteins that can be detected in a given cell type. Such integrated 2-dimensional gel protein databases offer an easy and standardized medium in which to store and communicate protein information and provide a unique framework in which to focus a multidisciplinary approach to study the cell. Once a protein is identified in the database, all of the information accumulated can be easily retrieved and made available to the researcher. In the long run, protein databases are expected to foster a wide variety of biological information that may be instrumental to researchers working in many areas of biology, among others cancer and oncogene studies, differentiation, development, drug development and testing, genetic variation, and diagnosis of genetic and clinical diseases (Fig. 1).

The approach using systematic 2-dimensional gel protein analysis has recently gained a new dimension with the advent of techniques to microsequence major proteins recorded in the databases (refs 24-42 and references therein). Partial protein sequences can be used to search for protein identity as well as to prepare specific DNA probes for cloning as yet uncharacterized proteins (Fig. 1). As these sequences can be stored in the database (see for example Fig. 2H), they offer a unique opportunity to link information on proteins with the existing or forthcoming DNA sequence data on the human genome (Fig. 1) (20, 36, 39).

Figure 1. Interface between partial protein sequence databases, comprehensive 2-dimensional gel databases, and the human genome sequencing project. Appropriate software is required to compare protein and DNA sequences. In general, although the inference of a protein's sequence from the DNA sequence (thick arrow) is direct and unambiguous, the DNA sequence can only be inferred approximately from the protein sequence (thin arrow), and cloning of the gene requires either a cDNA or the requisite group of oligonucleotide probes deduced from the partial amino acid sequence. Modified from ref 6.

Using the integrated approach offered by comprehensive 2-dimensional gel databases (Fig. 1), it will be possible to identify phenotype-specific proteins; microsequence them and store the information in the database; search for homology with previously characterized proteins; clone the cDNAs; assign partial protein sequences to genes for which the full DNA sequence and the chromosome location are known; and study the regulatory properties and function of groups of proteins (pathways, organelles, etc.) that are coordinately expressed in a given biological process. Comprehensive 2-dimensional gel protein databases will depict an integrated picture of the expression levels and properties of the thousands of protein components of organelles, pathways, and cytoskeletal systems in both physiological and abnormal conditions and are expected to lead to identification of new regulatory networks in different cell types and organisms. In the future, 2-dimensional gel protein databases may be linked to each other as well as to national and international specialized databanks on nucleic acid and protein sequences, protein structures, NMR experimental data, complex carbohydrates, etc.

A few 2-dimensional gel protein databases that are accessible in computer form have been published in extenso; these correspond to the protein-gene database of Escherichia coli K-12 developed by Neidhardt and colleagues (14, 23), the rat REF52 database established by Garrels and co-workers at Cold Spring Harbor (18, 22), and a few human databases (transformed amnion cells [15, 20], normal embryonal lung MRC-5 fibroblasts [17, 21], keratinocytes [19], and peripheral blood mononuclear cells [15]) developed in Aarhus. Given space limitations and to keep this review in focus, we will concentrate on the computerized analysis of human cellular 2-dimensional gel patterns, and in particular on the steps involved in establishing comprehensive 2-dimensional gel databases that can link protein and DNA information.



MAKING AND MANAGING A COMPREHENSIVE
2-DIMENSIONAL GEL DATABASE OF HUMAN
CELLULAR PROTEINS

The first step in making a comprehensive 2-dimensional gel protein database is to prepare a synthetic image (a digitized form of the gel image) of the gel (fluorogram, Coomassie blue or silver stained gel) to be used as a standard or master reference. This can be done with laser scanners, charge couple device (CCD) array scanners, television cameras, rotating drum scanners, and multiwire chambers (13). Computerized analysis systems for spot detection, quantitation, pattern matching, and data handling (access and retrieval of information, database making) have been described in the literature (ELSIE [43], GELLAB [11], HERMeS [44], MELANIE [10], QUEST [9], and TYCHO [8]) and some are available commercially (PDQUEST, Protein Database Inc., Huntington, N.Y.; KEPLER, Large Scale Biology, Rockville, Md.; Visage, BioImage Corporation, Ann Arbor, Mich.; Gemini, Joyce Loebl, Gateshead; Microscan 1000, Technology Resources Inc., Nashville, Tenn.; and MasterScan, Billerica, Mass.). Unfortunately, most of these systems are incompatible with one another; their advantages and disadvantages have been discussed by Miller (13).

In our work station in Aarhus, fluorograms are scanned with a Molecular Dynamics laser scanner and the data are analyzed using the PDQUEST II software (Protein Databases Inc.) (12) running on a SPARCstation computer 4100 FC-8-P3 from SUN Microsystems, Inc. The scanner measures intensity in the range of 0-2.0 absorbance. A typical scan of a 17 x 17 cm fluorogram takes about 2 min. Steps in image analysis include: initial smoothing, background subtraction, final smoothing, spot detection, and fitting of ideal Gaussian distributions to spot centers. Spot intensity is calculated as the integral of the fitted Gaussian. If calibration strips containing individual segments of a known amount of radioactivity are used, it is possible to merge multiple exposures of the sample image into a single data image of greater dynamic range. Once the synthetic image is created it can be stored on disk and displayed directly on the monitor. Functions that can be used to edit the images include: cancel (for example, to erase scratches that may have been interpreted as spots by the computer, or to cancel streaks or low dpm spots), combine (sometimes a spot may be resolved into several closely packed spots), restore, uncombine, and add spot to the gel. The process is time consuming, taking about 1.5 days per image. Edited standard images can be matched to other synthetic images. Figure 2A shows a portion of a standard synthetic image (IEF) of a fluorogram of [35S]methionine-labeled cellular proteins from human AMA cells (master database) (20). Images can be displayed either in black and white (resembling the original fluorograms) or in color (other images in Fig. 2), depending on the need. As shown in Fig. 2B, each polypeptide is assigned a number by the computer, which facilitates the entry and retrieval of qualitative and quantitative information for any given spot in the gel (20). The standard image can be matched automatically by the computer to other standard or reference gels (Fig. 2C, matching of AMA cellular proteins [left] to MRC-5 proteins [right]) provided a few landmark spots are given manually as reference (indicated with a * in Fig. 2C) to initiate the process.
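
The statement that spot intensity is the integral of a fitted Gaussian can be illustrated with a short sketch. It assumes an axis-aligned 2-dimensional Gaussian spot model, for which the integral has the closed form 2π·A·σx·σy; the function names and the example spot parameters are hypothetical, and nothing here is taken from the PDQUEST software itself.

```python
import math


def gaussian_2d(x, y, amplitude, x0, y0, sigma_x, sigma_y):
    """Value of an axis-aligned 2D Gaussian spot model at pixel (x, y)."""
    return amplitude * math.exp(
        -((x - x0) ** 2) / (2 * sigma_x ** 2) - ((y - y0) ** 2) / (2 * sigma_y ** 2)
    )


def spot_volume(amplitude, sigma_x, sigma_y):
    """Closed-form integral of the Gaussian over the whole plane."""
    return 2 * math.pi * amplitude * sigma_x * sigma_y


def numeric_volume(amplitude, x0, y0, sigma_x, sigma_y, half_width=30):
    """Sum the model on a 1-pixel grid as a check on the closed form."""
    total = 0.0
    for x in range(int(x0) - half_width, int(x0) + half_width + 1):
        for y in range(int(y0) - half_width, int(y0) + half_width + 1):
            total += gaussian_2d(x, y, amplitude, x0, y0, sigma_x, sigma_y)
    return total


# Hypothetical spot: peak of 1.5 absorbance units, widths of 3 and 4 pixels.
analytic = spot_volume(1.5, 3.0, 4.0)
numeric = numeric_volume(1.5, 50.0, 50.0, 3.0, 4.0)
print(round(analytic, 2), round(numeric, 2))
```

Using the fitted model instead of summing raw pixels means that overlapping neighbours and local background contribute less to the reported intensity, at the cost of assuming that spots are close to Gaussian in shape.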



Abbreviations: CCD, charge couple device; PCNA, proliferating cell nuclear antigen; HPLC, high performance liquid chromatography.




Figure 2. A) Synthetic image of a fraction of an IEF gel of the master image of AMA cellular proteins. B) As in A but showing numbers assigned to each spot. C) Comparison of AMA (left) and normal human embryonal lung MRC-5 fibroblast (right) IEF protein patterns. Matched proteins are indicated by a - or by the same letters in both gels. Once a protein is matched, information contained in the various categories available in the master AMA database can be transferred. D) Synthetic image of a fraction of an IEF fluorogram of [35S]methionine-labeled proteins from normal human MRC-5 fibroblasts. The histograms show levels of synthesis of a few proteins in MRC-5 (left bar) and SV40 transformed MRC-5 (right bar) fibroblasts. E) Polypeptides that contain information under the category glycolytic pathway. F) The function peruse annotations for spot allows the operator to inquire about categories and information available for a given protein. G) Relative abundance of cytoskeletal and cytoskeletal-related proteins in quiescent, proliferating, and SV40-transformed MRC-5 fibroblasts. H) Polypeptides that contain information under the category partial amino acid sequences.

The automatic matching process, which has been described in detail by Garrels et al. (13), takes about 5 min. Matched proteins are indicated with the same letters in both gels (Fig. 2C). The usefulness of this function is emphasized by the fact that data accumulated on common household proteins can be easily transferred to any other human cell type whose 2-dimensional gel cellular protein pattern is matched to our standard AMA 2-dimensional gel protein image. Alternatively, if the standard gel is part of a matchset (set of gels in a given experiment) it can be used as a linker gel to compare, for example, the quantitative values of a given protein throughout the experiment (see Fig. 2D; levels of some proteins in normal and SV40 transformed human MRC-5 fibroblasts) or with other standard images in different sets of cross-matched experiments (18, 22).
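
The idea of seeding an automatic match with a few manually chosen landmarks can be sketched as follows. This is a deliberately simplified illustration, not the published matching algorithm: it assumes the two gels differ only by a constant shift, estimates that shift from the landmark pairs, and then pairs each reference spot with the nearest spot in the other gel within a tolerance. All names and coordinates are hypothetical.

```python
import math


def mean_offset(landmarks):
    """Average (dx, dy) over landmark pairs ((x_ref, y_ref), (x_other, y_other))."""
    n = len(landmarks)
    dx = sum(other[0] - ref[0] for ref, other in landmarks) / n
    dy = sum(other[1] - ref[1] for ref, other in landmarks) / n
    return dx, dy


def match_spots(reference, other, landmarks, tolerance=3.0):
    """Map reference spot ids to the nearest spot in the other gel after shifting.

    reference, other: dicts of spot id -> (x, y). Returns dict ref_id -> other_id.
    A one-to-one assignment is not enforced in this sketch.
    """
    dx, dy = mean_offset(landmarks)
    matches = {}
    for ref_id, (x, y) in reference.items():
        px, py = x + dx, y + dy  # predicted position in the other gel
        best_id, best_dist = None, tolerance
        for other_id, (ox, oy) in other.items():
            dist = math.hypot(ox - px, oy - py)
            if dist <= best_dist:
                best_id, best_dist = other_id, dist
        if best_id is not None:
            matches[ref_id] = best_id
    return matches


# Hypothetical coordinates (in mm) for a master gel and a second gel.
master = {101: (10.0, 20.0), 102: (30.0, 25.0), 103: (55.0, 40.0), 104: (70.0, 15.0)}
second = {"a": (12.1, 21.0), "b": (31.9, 26.1), "c": (57.0, 40.9), "d": (90.0, 90.0)}
landmarks = [(master[101], second["a"]), (master[103], second["c"])]
print(match_spots(master, second, landmarks))
```

Real gels distort non-uniformly, so a practical system would need a local or higher-order transformation; the constant shift is used here only to keep the example short.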

Once a standard map of a given protein sample is made, one can enter qualitative annotations to make a reference database. Our master 2-dimensional gel database of transformed human amnion cell (AMA) proteins (20) lists 3430 polypeptides, of which 2592 correspond to cellular components, having pIs ranging from 4 to 13 and molecular weights between 8.5 and 230 kDa. The most abundant protein in the database corresponds to total actin (3.87% of total protein; about 90 million molecules per cell), while the least abundant of the recorded polypeptides are present in the vicinity of 5000 molecules per cell. Some annotation categories we are using to establish the master AMA database include: 1) protein identification (comigration with purified proteins, 2-dimensional immunoblotting, microsequencing); 2) amounts (total amounts and levels of synthesis); 3) subcellular localization (nuclear, cytoskeletal, membrane, membrane receptors, specific organelles, etc.); 4) antibodies; 5) posttranslational modifications (phosphorylation, glycosylation, methylation, etc.); 6) microsequencing; 7) cell cycle specificity (specific variations in levels of synthesis and amount); 8) regulatory behavior (effect of hormones, growth factors, heat shock, etc.); 9) rate of synthesis in normal and transformed cells (proliferation sensitive proteins, cell cycle specific proteins, oncogenes, components of the pathway [or pathways] that control cell proliferation); 10) function (mainly from comigration with proteins of known function); 11) sets of proteins that are coordinately regulated (hierarchy of controls, differential gene expression in various cells, etc.); 12) cDNAs (cloned cDNAs); 13) proteins that are specific to a given disease (systematic comparison of protein patterns of fibroblast proteins from healthy and diseased individuals); 14) expression and exploitation of transfected cDNAs; 15) pathways (metabolic, others); 16) gene localization (genetic and physical); 17) effect of microinjected antibody on patterns of protein synthesis; and 18) secreted proteins.

Information entered for any spot in a given annotation category can be easily retrieved by asking the computer to display the information on the color screen. For example, Fig. 2E shows a synthetic image of a NEPHGE gel (master AMA database) displaying the information contained under the entry glycolytic pathway. Alternatively, one can use the function peruse annotations for spot to directly ask the computer to list all the entries available for a particular protein. By clicking the mouse in a given entry (in this case, presence in fetal human tissues) it is possible to take a quick look at the information in that particular entry (Fig. 2F).
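
The two retrieval modes described above (all spots in a category, and all categories for a spot) map naturally onto a store keyed by spot number. The class below is a hypothetical toy design meant only to make that structure concrete; it does not reproduce the database software. The values for spot 8216 come from Table 1, and spot 9999 is a made-up placeholder.

```python
from collections import defaultdict


class GelDatabase:
    """Toy annotation store keyed by spot number (hypothetical design)."""

    def __init__(self):
        # spot number -> {category: information}
        self._annotations = defaultdict(dict)

    def annotate(self, spot, category, information):
        """Record one piece of information for a spot under a category."""
        self._annotations[spot][category] = information

    def peruse(self, spot):
        """List every category recorded for one spot."""
        return sorted(self._annotations.get(spot, {}))

    def spots_in_category(self, category):
        """List every spot that carries an entry in the given category."""
        return sorted(s for s, cats in self._annotations.items() if category in cats)


db = GelDatabase()
# Values below are taken from Table 1 for lipocortin V (IEF SSP 8216).
db.annotate(8216, "protein name", "lipocortin V")
db.annotate(8216, "isoelectric point", 4.76)
db.annotate(8216, "apparent molecular weight (kDa)", 33.3)
# Spot 9999 is a made-up placeholder used only to show a category query.
db.annotate(9999, "pathway", "glycolytic pathway")

print(db.peruse(8216))
print(db.spots_in_category("pathway"))
```

Keying every record on the spot number is what allows annotations to be transferred between gels once their spots have been matched.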

A major obstacle encountered in building comprehensive 2-dimensional gel protein databases is identifying the large number of proteins separated by this technology. In our databases (20, 21), known proteins are identified by one or a combination of the following procedures: 1) comigration with known proteins, 2) 2-dimensional gel immunoblotting using specific antibodies, and 3) microsequencing of Coomassie Brilliant Blue stained human proteins recovered from dried 2-dimensional gels (see next section). Protein identification by means of microsequencing may be difficult, as individual protein members of families with short peptide differences may escape detection. In the gene-protein database of E. coli K-12 (14, 23), another major 2-dimensional gel database available at present, proteins are being identified by a wider range of tests that include comigration with purified proteins; genetic criteria (deletion, insertion, frameshift, nonsense, missense, regulatory), plasmid-bearing strains, and in vitro synthesis of protein; selective labeling (methylation, phosphorylation); peptide map similarity; and physiological criteria and selective derealization.



So far we have received nearly 550 antibodies from laboratories all over the world and these are being tested by 2-dimensional gel immunoblotting for antigen determination. Similarly, purified proteins and organelles provided by several laboratories have greatly aided identification of unknown proteins (20, 21). We routinely request antibodies and protein samples and promise the donors to make available all the information we may have accumulated on that particular protein. For example, Table 1 lists entries available for lipocortin V (IEF SSP 8216), also known as annexin V, VAC-α, endonexin II, renocortin, chromobindin-5, anticoagulant protein, PAP-I, 35-γ calcimedin, IBC, calphobindin, and anchorin CII.

As mentioned previously, one distinct advantage of 2-dimensional gel electrophoresis is the possibility of studying quantitative variations in cellular protein patterns that may lead to identification of groups of proteins that are expressed coordinately during a given biological process. Quantitation, however, is not an easy task, as reflected by the lack of published data on global cellular protein patterns. We believe this is partly due to difficulties in obtaining sets of gels that are suitable for computer analysis (streaking, material remaining at the origin, etc.) as well as to limitations (laborious editing time, need of calibration strips to merge images, limited dynamic range, etc.) in the computer analysis systems available at the moment. Perhaps the most advanced quantitative studies published so far using computer analysis have been carried out by Garrels and co-workers (18, 22). In particular, these investigators have established a quantitative rat protein database (18, 22) designed to study growth control (proliferation, growth inhibitors, and stimulation) and transformation in well-defined groups of cell lines obtained by transformation of rat REF52 cells with SV40, adenovirus, and the Kirsten murine sarcoma virus. These studies have revealed clusters of proteins induced or repressed during growth to confluence as well as groups of transformation-sensitive proteins that respond in a differential fashion to transformation by DNA and RNA viruses. A most interesting feature of this quantitative database is the discovery of a group of coregulated proteins that show expression patterns similar to that of the cell cycle-regulated DNA replication protein known as proliferating cell nuclear antigen (PCNA)/cyclin (45).

In our human databases, most quantitations have been carried out by estimating the radioactivity contained in the polypeptides by direct counting of the gel pieces in a scintillation counter (20, 21). Up to 700 proteins can be cut out through appropriately exposed films in a period of time comparable to that required for editing a synthetic image. Manual quantitation of this large number of spots is difficult without the assistance of a master reference image and a numbering system that can be used to identify the spots. Using this approach, we have recorded quantitative changes in the relative abundance of 592 [35S]methionine-labeled proteins synthesized by quiescent, proliferating, and SV40 transformed human embryonic lung MRC-5 fibroblasts (21). Some data concerning cytoskeletal and cytoskeletal-related proteins are presented in Fig. 2G. Our studies as well as those of Garrels and co-workers (18, 22) may in the long run help define patterns of gene expression that are characteristic of the transformed state.

OTHER 2-DIMENSIONAL GEL PROTEIN 
DATABASES 



As mentioned previously, there are other 2-dimensional gel databases available in computer form that have been published in extenso; these correspond to the E. coli K-12 protein-gene database (14, 23) and to the rat REF52 database (18, 22).

TABLE 1. Some entries for lipocortin V in the human AMA 2-dimensional gel protein database

Entries for lipocortin V (IEF SSP 8216): Information entered

1. Protein name: Lipocortin V, renocortin, chromobindin-5, endonexin II, anticoagulant protein, PAP-I, VAC-α, 35-γ calcimedin, IBC, calphobindin I, anchorin CII, annexin V
2. Percentage of total protein: 0.110% (about 2,800,000 molecules per cell)
3. Apparent molecular weight (Mr): 33.3 kDa
4. Isoelectric point (pI): 4.76
5. Method (or methods) of identification: Microsequencing, 2-dimensional immunoblotting, comigration
6. Credit to investigators that aided in identification: G. Bauw, J. Vandekerckhove, and colleagues, Rijksuniversiteit Gent; B. Pepinsky, BIOGEN, Cambridge; N.G. Ahn, University of Washington
7. Antibody against protein: Polyclonal (rabbit, antibody no. 20), B. Pepinsky, BIOGEN, Cambridge
8. Comigration with human proteins: Lipocortin V, N.G. Ahn, Howard Hughes Medical Institute, Washington University
9. Cellular localization: Subcortical membrane
10. Calcium/phospholipid-dependent membrane proteins: Lipocortin V
11. Function: Regulation of various aspects of inflammation, immune response, blood coagulation, and differentiation
12. Partial amino acid sequence: GTYTDFPGFDER (7-18), VLTEIIASR (109-117), QVYEEEYGSSLEDDVVG (127-143), ?GTDEEKFITIFGT(R) (187-201)
13. cDNA sequence: Known. R. Blake et al., J. Biol. Chem. 263, 10799-10811; 1988 (pI = 4.76 from translated sequence)
14. Levels in fetal human tissues: recorded for adrenal glands, brain, cerebellum, ear, eye, heart, hypophysis, liver, lung, meninges, mesonephric tissue, striated muscle, pancreas, skin, spleen, stomach, submandibular gland, small intestine, thymus, thyroid gland, tongue, and ureter (level symbols illegible in source)
15. Levels in quiescent, proliferating, and transformed MRC-5 fibroblasts: Q (quiescent) = 1.1; P (proliferating) = 1.0; T (SV40 transformed) = 0.3
16. Distribution in Triton supernatant and cytoskeletons: Mainly supernatant

The E. coli K-12 cellular protein-gene database is perhaps the most complete of all databases reported so far and eventually it should trace each protein back to its structural gene. Information contained in this database includes: gene/protein name (protein name, EC number, gene name); 2-dimensional gel spot designations (x-y coordinates from reference gels, alphanumeric designation); genetic information (linkage map location, physical map location, GenBank code, sequence reference, location on Kohara clones); biochemical information (molecular weight, pI, number of residues of each amino acid, mole percent of each amino acid, total number of amino acids in a polypeptide); and regulatory information (cellular level of protein in different media and at different temperatures, member of regulon, member of stimulon). Major advances of this database are envisaged in the future in view of the imminent sequencing of the whole E. coli genome as well as the development of improved methods to express cloned genes.

The rat REF52 2-dimensional gel protein database lists about 1600 proteins that have been recorded using the QUEST analysis system (18, 22). Included in this quantitative database are 1) protein names (cytoskeletal and heat shock proteins as well as various nuclear, mitochondrial, and cytoplasmic proteins), 2) annotations (subcellular localization, modification, recognition by specific antibodies, coprecipitation, NH2-terminal sequence, cross-reference to protein sequence information, and references to the literature), 3) protein sets (cytoskeletal proteins, phosphoproteins, sets of proteins with PCNA/cyclin-like properties, etc.), and 4) general quantitative data (protein synthesis during growth of normal REF52 cells to confluence and quiescence, and after restimulation of growth-inhibited cells).

In addition to the 2-dimensional gel databases mentioned so far, there are several smaller cellular databases being established in human (normal human diploid fibroblasts, lymphocytes, leukocytes, leukemic cells), mouse (NIH/3T3 cells, T lymphocytes), Aplysia, yeast (Saccharomyces cerevisiae), plants (wheat, barley, sorghum), and Euglena. Databases of tissue proteins (brain, whole mouse, liver) and body fluid proteins (plasma proteins, cerebrospinal fluid, urine, and milk) are being established in several laboratories. The reader is directed to the review by Celis et al. (4) for details and references concerning these databases.

MICROSEQUENCING HAS ADDED A NEW
DIMENSION TO COMPREHENSIVE
2-DIMENSIONAL GEL DATABASES: A DIRECT
LINK BETWEEN PROTEINS AND GENES

The development of highly sensitive gas-phase or liquid-phase amino acid sequenators (24), together with the establishment of efficient protein and peptide sample preparation methods, has opened the possibility of performing a systematic sequence analysis of proteins resolved by 2-dimensional gel electrophoresis. Indeed, the partial protein sequences generated can be used to search for protein identity (comparison with available sequences stored in databanks) as well as for preparing specific DNA probes for cloning of as yet uncharacterized proteins (Fig. 1). In addition, partial protein sequences can be stored in 2-dimensional gel databases (for example, see Fig. 2H) and offer a unique link between proteins and genes (Fig. 1).
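
Because most amino acids are encoded by more than one codon, a partial protein sequence specifies a family of possible DNA sequences, which is why probes deduced from it are mixtures of oligonucleotides. The sketch below counts that family using the codon multiplicities of the standard genetic code and picks the least degenerate stretch of a peptide. The function names and the choice of a six-residue window are illustrative assumptions; the peptide is one of the lipocortin V sequences listed in Table 1.

```python
# Number of codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    total = 1
    for residue in peptide:
        total *= CODON_COUNT[residue]
    return total


def best_window(peptide, length=6):
    """Return (degeneracy, start, window) for the least degenerate stretch.

    The peptide must contain at least `length` residues.
    """
    windows = [
        (degeneracy(peptide[i:i + length]), i, peptide[i:i + length])
        for i in range(len(peptide) - length + 1)
    ]
    return min(windows)


# Partial sequence of lipocortin V listed in Table 1.
peptide = "GTYTDFPGFDER"
print(degeneracy(peptide))
print(best_window(peptide))
```

Stretches rich in residues with one or two codons (M, W, D, E, F, Y and similar) give the smallest probe mixtures, which is one reason several internal sequences per protein are useful.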

In the early 1970s gel electrophoresis was used to purify proteins for sequencing purposes (reviewed by Weber and Osborn in ref 25). Proteins were recovered by diffusion and sequenced by the manual dansyl-Edman degradation at the nanomole level. This technique was further refined by using electro-elution to recover proteins and by miniaturizing the system (26). This method has been used extensively, but showed increasing drawbacks (low yields, protein samples contaminated by free amino acids, and NH2-terminal blocking) as the amounts of handled protein gradually became smaller (e.g., at the 10 picomol level).

Most of the problems referred to above have been minimized with the introduction of protein-electroblotting procedures (27-32). When proteins are blotted on chemically inert membranes, it is possible to sequence the immobilized proteins directly without additional manipulations. Thus, depending on the amount of bound protein and its nature, this direct sequencing procedure generally yields NH2-terminal sequences containing 10-40 residues. As such, this technique was used to identify, by their NH2-terminal sequences, differentially expressed major proteins from total cellular extracts separated on 2-dimensional gels. A major difficulty encountered in this procedure is the occurrence of frequent artefactual blockage of the proteins. Several studies suggest that this phenomenon is mainly due to reaction with contaminants (particularly unpolymerized acrylamide present in the gel) and to a high dilution of the protein (low concentration of the protein per unit membrane surface). In addition to this primarily technical problem, many proteins are blocked in vivo by acylation or by a pyrrolidone carboxylic acid cap.

The problem of partial or complete NH2-terminal blockage can be circumvented by generating internal amino acid sequences. This is achieved by fragmenting the protein present in the gel (gel in situ cleavage) or by cleaving it while bound to the membrane (membrane in situ cleavage) (33-35). In both cases, proteins are either cleaved in a restricted way (e.g., by limited enzymatic digestion or by using restrictive chemical cleavage conditions) or fragmented into smaller peptides.



Of the different combinations examined, we have obtained the best results by using exhaustive proteolytic digestion of membrane-immobilized proteins. This method has been described for Ponceau red-stained proteins on nitrocellulose blots (34), for Amido black-stained Immobilon-bound proteins, and for fluorescamine-detected proteins on glass fiber membranes (35). The proteases used (trypsin, chymotrypsin, or pepsin) cleave at multiple sites, generating small peptides that elute from the blot into the digestion buffer, from which they are purified by reversed-phase high performance liquid chromatography (HPLC) before being sequenced individually. Although each of these manipulations could be expected to result in a reduced yield of final sequence information, we were surprised that the peptides could be sequenced with high efficiency. In our hands, this approach could be routinely applied to gel-purified proteins available in amounts ranging from 5 to 10 µg, and often yielded sequence information covering more than 30% of the total protein. As membrane-immobilized proteins are not homogeneously digested, but rather show protease-sensitive regions next to resistant regions, the number of peptides generated is much lower than expected from the number of potential cleavage sites. Consequently, HPLC peptide chromatograms are less complex and most peptides can be recovered in pure form.
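
The notions of potential cleavage sites and sequence coverage can be made concrete with a small sketch. It uses the commonly quoted trypsin rule (cleavage after K or R unless the next residue is P), which is a general simplification and not a description of the behaviour observed on membranes, where digestion is incomplete. The input sequence and the residue ranges are made up for illustration.

```python
def tryptic_peptides(sequence):
    """Split after K or R, except when the next residue is P (common trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def coverage(ranges, protein_length):
    """Fraction of residues covered by 1-based inclusive (first, last) ranges."""
    covered = set()
    for first, last in ranges:
        covered.update(range(first, last + 1))
    return len(covered) / protein_length


# Made-up sequence, for illustration only.
print(tryptic_peptides("MAQVLRGTVTDFPGFDERADAETLRKPAMK"))
# Hypothetical sequenced stretches on a hypothetical 100-residue protein.
print(coverage([(7, 18), (40, 52), (45, 60)], 100))
```

Overlapping stretches are counted once, so the coverage value reflects distinct residues for which sequence was obtained.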

As only limited amounts of a protein mixture can be loaded on a 2-dimensional gel, proteins of interest are often obtained in yields insufficient for the currently available sequencing technology. More material can be obtained by enriching for a certain subcellular fraction (purified cell organelles) or by exploiting affinity (dyes, metals, drugs, etc.) or hydrophobic properties of proteins before gel analysis. All of the sequencing results accumulated so far in the human protein database (20) (a few are shown in Fig. 2H) have been obtained from analysis of protein spots collected from 2-dimensional gels that had been stained with Coomassie blue according to standard procedures and dried for storage. Proteins are recovered from the collected gel pieces by a protein-elution-concentration device, combined with gel electrophoresis and electroblotting. Details of this technique have been reported in a previous communication (42) and a brief outline is given below.

Combined gel pieces are allowed to swell in gel sample
buffer (a total volume of 1.5 ml). The gel pieces combined
with the supernatant are then collected into a large slot made
in a new gel. The slot is further filled with Sephadex G-10
equilibrated in gel sample buffer. During consecutive gel
electrophoresis, most of the electrical current passes on the
side of the slot instead of passing through the slot. This
results in both a vertical stacking and horizontal contraction
of the protein band. With this device the protein is efficiently
eluted from the gel pieces and concentrated from a large
volume into a narrow spot. The highly concentrated (about
5 mm²) protein spot is then electroblotted on PVDF
membranes, stained with Amido black, and in situ digested
with trypsin. The peptides generated during digestion elute
from the membrane into the supernatant, and can be separated
by narrow bore reversed-phase HPLC and collected
individually for sequence analysis.

Using this and previous procedures (37, 39, 42), we have
so far analyzed 70 protein spots collected from
2-dimensional gels (20, and unpublished observations) (see
for example Fig. 2H). The sequence information amounts to
2100 allocated residues corresponding to an average of 30
residues per protein spot. So far we have made cDNAs of
many of the unknown proteins that have been microsequenced,
and a substantial number have been cloned and sequenced.
All available information indicates that it may be
possible to obtain partial sequence information from most of
the proteins that can be visualized by Coomassie Brilliant
Blue staining.

Partial protein sequences are stored in the database as
displayed in Fig. 2H, and it should be possible in the near
future to interface this information with forthcoming DNA
sequence data from the human genome project. In the long
run, as the human genome sequences become available, it
will be possible to assign partial protein sequences to genes
for which the full DNA sequence and chromosomal location
are known (Fig. 1).
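
As an illustration of how a partial protein sequence might be assigned to a gene of known DNA sequence, the following minimal Python sketch translates each DNA sequence in its three forward reading frames with the standard genetic code and looks for an exact match to the peptide. The gene names and sequences are invented for the example, and the exact-match search on the forward strand only is a simplifying assumption; a realistic search would also have to handle the reverse strand, introns in genomic DNA, and ambiguous residue calls.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, offset=0):
    """Translate DNA starting at `offset`; unknown codons become 'X'."""
    dna = dna.upper()
    protein = []
    for i in range(offset, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


def assign_peptide(peptide, genes):
    """Return (gene name, frame offset, residue index) for each exact hit."""
    hits = []
    for name, dna in genes.items():
        for offset in range(3):  # three forward reading frames
            position = translate(dna, offset).find(peptide.upper())
            if position != -1:
                hits.append((name, offset, position))
    return hits


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    genes = {
        "geneA": "CATGGCTAAAGATGAATAA",
        "geneB": "ATGCCCGGGTTTTAA",
    }
    print(assign_peptide("AKDE", genes))  # [('geneA', 1, 1)]
```

In this toy case the peptide AKDE is found in the second reading frame of the hypothetical geneA, which is the kind of link between a sequenced spot and a mapped gene that the text anticipates.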

SUMMARY 

The studies presented in this brief review are intended to 
demonstrate the usefulness of computer-aided 2-dimensional 
gel electrophoresis and microsequencing to analyze cellular 
protein patterns, and to link protein and DNA information. 
As more information is gathered worldwide, comprehensive
databases will depict an integrated picture of the expression
levels and properties of the thousands of proteins that orches- 
trate most cellular functions. 

Clearly, databases allow easy access to a large body of data
and provide an efficient medium to communicate standardized
protein information. In the future, databases will
foster a wide variety of biological information that can be
used to support collaborative research projects in basic and
applied biology as well as in clinical research (2, 5, 46). Once
a protein is identified in a particular database, all the information
gathered on it can be made available to the scientist.
However, many problems must be solved before protein
databases become of general use to the scientific community.
A most urgent one is to promote standardization of the gel
running conditions so that data produced in a given laboratory
may be used worldwide. Surprisingly, the gel running
technology as it stands today is still a craftsmanship art.

Finally, comprehensive, computerized databases of proteins,
together with recently developed techniques to
microsequence proteins, offer a new dimension to the study
of genome organization and function (Fig. 1). In particular,
human protein databases may become increasingly important
in view of the concerted effort to map and sequence the
entire human genome. This formidable task is expected to
dominate biological research in the next decades.

We would like to thank S. Himmelstrup Jorgensen for typing the

1 Introduction

Tumors may develop by a number of different mechan- 
isms in any given cell type. At the time of diagnosis, 
tumors will have progressed along different pathways to 
various stages of malignancy. To provide a basis for indi- 
vidual therapy it is of importance to examine specific 
properties of the tumor cell population in each patient. 
A large number of different markers have been de- 
scribed in order to increase the diagnostic accuracy. It is 
likely that a combination of several markers is needed
in the future in order to reflect different properties of
the tumor. One important method for the resolution of a
large number of potential markers is two-dimensional
electrophoresis (2-DE). Extensive efforts are being made
in identifying various polypeptides separated by 2-DE
and to characterize how the expression of these polypeptides
is affected by the response to cellular transformation
and various culture conditions [1, 2]. It would be of
value to transfer this information to 2-DE separations of 
polypeptides from tumor tissue samples. However, one 
prerequisite is that the quality of the 2-DE gels from 
tumor samples is comparable in quality with 2-DE gels 
from samples of cultured cells. 

Frozen tumor tissues are commonly used for various bio- 
chemical assessments. However, if such samples are ana- 
lyzed by 2-D polyacrylamide gel electrophoresis (PAGE), 
the polypeptide patterns are obscured by contamination 
of serum and connective tissue proteins. Such
nontumor-cell-related variations represent serious problems in
the interpretation and inter-patient comparison of 2-DE 
Abbreviations: 2-DE, two-dimensional polyacrylamide gel
electrophoresis; IEF, isoelectric focusing; LDH, lactate dehydrogenase;
NP-40, Nonidet P-40; PBS, phosphate buffered saline; PCNA,
proliferating cell nuclear antigen; PIH, protease inhibitors; PMSF,
phenylmethylsulfonyl fluoride; SDS, sodium dodecyl sulfate; WW, wet
weight
patterns [3]. 2-DE patterns of cells prepared from fresh
tumor material were analyzed after enzymatic extraction
of tumor cells [4, 5] or after culturing tumor fragments in
medium containing radioactive amino acids [6]. These 
procedures may, however, lead to alterations in the gene 
expression/polypeptide patterns. We are only aware of 
one study where nonenzymatic extraction of cells from 
fresh tumor tissue (prostate cancer) was used to prepare 
samples for 2-D PAGE [4]. We have examined enzymatic 
extraction and various nonenzymatic preparation tech- 
niques, including fine needle aspiration, for the prepara- 
tion of cells from fresh tumor tissues. We describe 
nonenzymatic extraction procedures that are rapid, lead 
to high-quality 2-DE patterns, and that alleviate the 
necessity to purify tumor cell populations from dead 
cells. 

2 Materials and methods 

2.1 Cell cultures and samples used for spot 
identification 

A rat embryonal fibroblast cell line, WT2 (a kind gift
from Dr. J. I. Garrets and Dr. S. Pattersson), was used for
the identification of a number of heat shock and structural
proteins. Human normal diploid lung fibroblasts,
WI38, and human epithelial breast carcinoma cells,
MDA-231 and MCF-7, were purchased from ATCC and grown
as recommended. Polypeptides prepared from a leukemia
type pre-B-ALL were separated by 2-DE. The
2-DE map was then analyzed by Dr. S. M. Hanash
(University of Michigan, Ann Arbor, USA).

2.2 Tumor tissue samples

In this study, 2-DE maps from seven tumors were used
as representative illustrations: two adenocarcinomas of
the lung (LA and LB, mucinous, both cases intermediate
grade of differentiation), one squamous carcinoma
of the lung (LS), one carcinoid-like breast cancer (BC),
one microfollicular adenoma (highly differentiated) of
the thyroid (TA), one highly differentiated hypernephroma,
a tumor of the kidney (KH), and finally one case
of poorly differentiated corpus carcinoma (CP).

Nonenzymatic extraction of cells from clinical tumor
material for analysis of gene expression by two-
dimensional polyacrylamide gel electrophoresis

We have compared different methods of preparation of malignant cells for
two-dimensional electrophoresis (2-DE). We found all methods using fresh
tissue to be superior compared to methods using frozen tissue. Our results
indicate that nonenzymatic methods of preparation of tumor cells, including
fine needle aspiration, scraping and squeezing, have advantages over methods
using enzymatic extraction of cells. Nonenzymatic methods are rapid, appear
to reduce loss of high molecular protein species, and alleviate the necessity of
separating viable and nonviable cells by Percoll gradient centrifugation. Using
these techniques, high-quality 2-DE maps were derived from tumors of the
lung and breast. In the resulting polypeptide patterns, heat shock proteins,
non-muscle tropomyosins and intermediate filaments were identified. We
conclude that nonenzymatic extraction of malignant cells from fresh tumor tissue
improves the possibilities that these techniques may be useful in clinical
diagnosis.

2.3 Preparation of cultured cells 

The cell monolayers were washed twice in phosphate
buffered saline (PBS) and then scraped off in ice-cold
PBS including protease inhibitors (PIH),
phenylmethylsulfonyl fluoride (PMSF) 0.2 mM and 0.83 mM
benzamidine, pelleted at 660 × g for 3 min (+4°C) and washed one
time before final centrifugation at 2700 × g for 5 min. The
wet weight of the cell pellet was recorded and the cells
were stored at -80°C until further processing.

2.4 Preparation of tumor tissue samples 

2.4.1 General remarks 

Macroscopically representative and non-necrotic tumor
tissues were selected within 20 min after resection.
Parallel samples were routinely prepared for cytology.
The samples were processed as rapidly as possible on ice
or at +4°C and in the presence of PIH. Cells were
stained with DiffQuick (Baxter) and usually examined on
three different occasions during the preparation
procedure: (i) cytology sample, (ii) extracted cells and (iii)
cells after Percoll gradient centrifugation.

2.4.2 Specimen acquisition 

The strategy of sample preparation is shown in Fig. 1.
Tumor tissue cell samples were usually obtained by fine
needle aspiration (NA) using a 0.7 mm needle. The
syringe was filled with 1-2 mL of ice-cold culture
medium/PIH. We found that if a tumor appeared to be very
fibrous it was difficult to extract enough cells for 2-DE
analysis. In these cases, two alternative techniques were
examined. (i) The tumor was cut in the middle and the
fresh surface scraped (SC) by a scalpel. The cell-rich
material was then transferred to ice-cold culture
medium (L15 with 5% fetal calf serum)/PIH. (ii) A part
of the tumor sample was placed in culture medium on
ice for further processing at the laboratory in the
following way: the material was cut into very small
fragments on a pre-cooled dissection plate and transferred
to a small glass chamber with a 0.7 mm metal net 5 mm
above the bottom of the chamber. Medium/PIH was
added to cover the sample (8 mL), which was gently
squeezed (SQ) towards the net in order to release and
wash out cells. NA and SC were also compared with an
enzymatic extraction (EE) procedure described
previously [5]. Briefly, thin slices of tissue were incubated
with collagenase (1 mg/mL) and elastase (2 mg/mL) in
medium for 1 h at 37°C. Extracted cells from every
sample were then subjected to Percoll gradient
centrifugation (Section 2.4.3).

2.4.3 Separation of cells by Percoll gradient
centrifugation 

The cell suspension was filtered through two nylon mesh
filters, (i) 250 µm and (ii) 100 µm, and then centrifuged
at 660 × g for 3 min. The cell pellet was resuspended very
carefully in medium, using a syringe, and loaded onto a
two-step discontinuous Percoll/PBS gradient, [illegible]%
(density = 1.03 g/mL) and [illegible]% (density = 1.07 g/mL),
and centrifuged at 1000 × g for 15 min. In this system,
dead cells stay on the top, viable cells sediment to the
interphase and erythrocytes sediment to the bottom. The
viability of cells in the top fraction and interphase was
checked by the trypan blue exclusion test. The
interphase cell layer ([illegible]% viability) was collected and
washed one time in a large volume of PBS/PIH
(centrifuged at 800 × g for 3 min). Finally, the cells were
resuspended in 1.4 mL PBS and pelleted at 2700 × g for 5
min. The wet weight (WW) was recorded and the pellet
was then stored at -80°C.

2.4.4 Final preparation of cells for 2-D PAGE analysis 

From this point, cultured cell samples were treated
in the same way as tumor cell samples. Each cell pellet
was thawed on ice and resuspended in [illegible] µL mQ water
per mg WW (= [illegible] × WW). The suspension was
frozen and thawed 4-5 × to break the cells [7]. A
volume of ([illegible] × WW) µL 10% sodium dodecyl
sulfate (SDS), including 35.3% mercaptoethanol, was
mixed with the sample and incubated 5 min on ice with
(0.329 × WW) µL of a solution of DNase I (0.144
mg/mL in 20 mM Tris-HCl with 2 mM CaCl2 × 2H2O, pH
8.8) and RNase A ([illegible] mg/mL Tris). The sample
was frozen and lyophilized. Sample buffer [10] including
Figure 1. Experimental flow chart showing main steps of the
preparation procedures. The abbreviations used for nonenzymatic extraction
procedures are: FZ, frozen sample preparation; NA, needle
aspiration; SC, scraped, and SQ, squeezed sample. Extracted cells are then
loaded as a suspension (top volume of each tube) onto either
1.07 g/mL Percoll (left), or a discontinuous Percoll gradient from the
nonenzymatic extraction (middle), or from enzymatic extraction
(right). Cellular top and interphase fractions are then used for 2-DE.
For details see Section 2.



